Online checkpointing with improved worst-case guarantees
Authors
Abstract
In the online checkpointing problem, the task is to continuously maintain a set of k checkpoints that allow an ongoing computation to be rewound faster than by a full restart. The only operation allowed is to replace an old checkpoint with the current state. We seek checkpoint placement strategies that minimize the rewinding cost, i.e., such that at all times T, when requested to rewind to some time t ≤ T, the number of computation steps that must be redone to reach t from a checkpoint before t is as small as possible. In particular, we want the closest checkpoint earlier than t to be no further from t than q_k times the ideal distance T/(k + 1), where q_k is a small constant. Improving on earlier work showing 1 + 1/k ≤ q_k ≤ 2, we show that q_k can be chosen asymptotically less than 2. We present algorithms with asymptotic discrepancy q_k ≤ 1.59 + o(1) valid for all k and q_k ≤ ln(4) + o(1) ≤ 1.39 + o(1) valid for k being a power of two. Experiments indicate the uniform bound q_k ≤ 1.7 for all k. For small k, we show how to use a linear programming approach to compute good checkpointing algorithms. This gives discrepancies of less than 1.55 for all k < 60. We prove the first lower bound that is asymptotically more than one, namely q_k ≥ 1.30 − o(1). We also show that optimal algorithms (yielding the infimum discrepancy) exist for all k.
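The quality measure described in the abstract can be illustrated with a minimal Python sketch. It is not one of the paper's algorithms. It only evaluates a given checkpoint set at a single time T, under the assumption that time 0 (a full restart) and time T (the current state) act as implicit endpoints, so the worst rewind cost is the largest gap between consecutive points. The function name `discrepancy` and this endpoint convention are illustrative assumptions, not taken from the source.

```python
def discrepancy(checkpoints, T):
    """Ratio of the largest gap to the ideal gap T / (k + 1).

    Assumption (not from the source): time 0 and time T are treated as
    implicit endpoints, so k checkpoints split [0, T] into k + 1 intervals.
    """
    k = len(checkpoints)
    # Sorted positions including the two implicit endpoints.
    points = [0.0] + sorted(float(c) for c in checkpoints) + [float(T)]
    # The worst rewind target sits just before the end of the longest interval.
    max_gap = max(b - a for a, b in zip(points, points[1:]))
    ideal_gap = T / (k + 1)
    return max_gap / ideal_gap


# Evenly spaced checkpoints achieve the ideal ratio of 1.
print(discrepancy([1, 2, 3], 4))
# Checkpoints bunched near the current time leave a long initial gap.
print(discrepancy([2, 3, 4], 4))
```

Under this convention, a strategy with discrepancy q_k keeps the value returned by `discrepancy` at or below q_k at every time T, even though it may only overwrite an old checkpoint with the current state.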
Related resources
Cooperative Checkpointing for Supercomputing Systems
A system-level checkpointing mechanism, with global knowledge of the state and health of the machine, can improve performance and reliability by dynamically deciding when to skip checkpoint requests made by applications. This thesis presents such a technique, called cooperative checkpointing, and models its behavior as an online algorithm. Where C is the checkpoint overhead and I is the request...
Online and Random-order Load Balancing Simultaneously
We consider the problem of online load balancing under ℓ_p-norms: sequential jobs need to be assigned to one of the machines and the goal is to minimize the ℓ_p-norm of the machine loads. This generalizes the classical problem of scheduling for makespan minimization (case ℓ_∞) and has been thoroughly studied. However, despite the recent push for beyond worst-case analyses, no such results are know...
Designing smoothing functions for improved worst-case competitive ratio in online optimization
Online optimization covers problems such as online resource allocation, online bipartite matching, adwords (a central problem in e-commerce and advertising), and adwords with separable concave returns. We analyze the worst case competitive ratio of two primal-dual algorithms for a class of online convex (conic) optimization problems that contains the previous examples as special cases defined o...
Exploiting easy data in online optimization
We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers some loss associated with the decision and the state of the environment. The learner’s objective is to minimize its cumulative regret against the best fixed decision in hindsight. Over the past few decades numerous variants have been considered, with many algorithms designed ...
The Best of Both Worlds: Stochastic and Adversarial Bandits
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 (Auer et al., 2002b) and the (poly)logarithmic regret of UCB1 (Auer et al., 2002a) for stochastic rewards. Adversarial rewards and stochastic rewards are the two...